Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones
Authors
Abstract
The performance of many noise-robust automatic speech recognition (ASR) methods, such as vector Taylor series (VTS) feature compensation, depends heavily on an estimate of the noise that contaminates the speech. Providing accurate noise estimates for this kind of method is therefore both crucial and challenging. In this paper we investigate the use of deep neural networks (DNNs) to perform noise estimation in dual-microphone smartphones. Thanks to the powerful regression capabilities of DNNs, accurate noise estimates can be obtained using only simple features, while also exploiting the power level difference (PLD) between the two microphones of the smartphone when it is used in close-talk conditions. This is confirmed by our word recognition results on the AURORA2-2C (AURORA2 2 Channels Conversational Position) database, where the approach, used together with a VTS feature compensation method, largely outperforms state-of-the-art single- and dual-channel noise estimation algorithms.
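The PLD mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact feature definition, so everything here is an assumption made for illustration: the frame length and hop, the dB ratio form of the PLD, the helper names `frame_power` and `pld_db`, and the synthetic signals, which only mimic a close-talk setup where speech is stronger at the primary microphone while the noise level is similar at both.

```python
import math
import random


def frame_power(signal, frame_len=256, hop=128):
    """Mean power of each (overlapping) frame of a 1-D signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [
        sum(x * x for x in signal[i * hop : i * hop + frame_len]) / frame_len
        for i in range(n_frames)
    ]


def pld_db(primary, secondary, eps=1e-12):
    """Per-frame power level difference in dB (assumed definition)."""
    p1 = frame_power(primary)
    p2 = frame_power(secondary)
    # eps guards against log of zero on silent frames.
    return [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(p1, p2)]


# Synthetic stand-ins for real recordings (hypothetical data).
random.seed(0)
speech = [random.gauss(0.0, 1.0) for _ in range(4000)]
noise = [random.gauss(0.0, 0.1) for _ in range(4000)]

# Close-talk assumption: speech is attenuated at the secondary microphone,
# while the noise reaches both microphones at a similar level.
primary = [s + n for s, n in zip(speech, noise)]
secondary = [0.3 * s + n for s, n in zip(speech, noise)]

pld = pld_db(primary, secondary)
print(f"{len(pld)} frames, mean PLD = {sum(pld) / len(pld):.1f} dB")
```

In this toy setup, frames dominated by speech show a clearly positive PLD, whereas identical channels give a PLD of zero; a per-frame cue of this kind is what a regression model could take as input alongside other simple features.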
Related resources
A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition
The inclusion of two or more microphones in smartphones is becoming quite common. These were originally intended to perform noise reduction, and little benefit is still being drawn from this feature for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. This novel system is based on d...
Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition
In this paper, we present a framework of a factored deep convolutional neural network (CNN) learning for noise robust automatic speech recognition (ASR). Deep CNN architecture, which has attracted great attention in various research areas, has also been successfully applied to ASR. However, to ensure noise robustness, since merely introducing deep CNN architecture into the acoustic modeling of ...
Discriminative Methods for Noise Robust Speech Recognition: a Chime Challenge Benchmark
The recently introduced second CHiME challenge is a difficult two-microphone speech recognition task with non-stationary interference. Current approaches in the source-separation community have focused on the front-end problem of estimating the clean signal given the noisy signals. Here we pursue a different approach, focusing on state-of-the-art ASR techniques such as discriminative training a...
Model-based independent component analysis for robust multi-microphone automatic speech recognition
In this communication, we present a method for noise-robust multimicrophone automatic speech recognition (ASR). It is assumed that the speech source to be recognized is recorded with several microphones in a noisy acoustic environment. The proposed method estimates the short-term subband energies (as they are needed for computing the ASR front-end) of the clean speech source from the ones of th...
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
Using deep neural networks (DNNs) for automatic speech recognition (ASR) has recently attracted much attention due to the large performance improvement they provide for a variety of tasks. DNNs are known to be robust to overfitting and to be able to remove speaker variability. Another important cause of variability in speech is the presence of noise. A lot of research has been undertaken on noi...